Disentangling Chat with Local Coherence Models
Abstract
We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data. A public implementation is available via https://bitbucket.org/melsner/browncoherence.
One property of a well-written document is coherence, the way each sentence fits into its context: sentences should be interpretable in light of what has come before, and in turn make it possible to interpret what comes after. Models of coherence have primarily been used for text-based generation tasks: ordering units of text for multidocument summarization or inserting new text into an existing article. In general, the corpora used consist of informative writing, and the tasks used for evaluation consider different ways of reordering the same set of textual units. But the theoretical concept of coherence goes beyond both this domain and this task setting, and so should coherence models.
This paper evaluates a variety of local coherence models on the task of chat disentanglement, or threading: separating a transcript of a multiparty interaction into independent conversations. Such simultaneous conversations occur in internet chat rooms and on shared voice channels such as push-to-talk radio. In these situations, a single, correctly disentangled conversational thread will be coherent, since the speakers involved understand the normal rules of discourse, but the transcript as a whole will not be. Thus, a good model of coherence should be able to disentangle sentences as well as order them.
There are several differences between disentanglement and the newswire sentence-ordering tasks typically used to evaluate coherence models. Internet chat comes from a different domain, one where topics vary widely and no reliable syntactic annotations are available. The disentanglement task measures different capabilities of a model, since it compares documents that are not permuted versions of one another. Finally, full disentanglement requires a large-scale search, which is computationally difficult.
We move toward disentanglement in stages, carrying out a series of experiments to measure the contribution of each of these factors. As an intermediary between newswire and internet chat, we adopt the SWITCHBOARD (SWBD) corpus. SWBD contains recorded telephone conversations with known topics and hand-annotated parse trees; this allows us to control for the performance of our parser and other informational resources. To compare the two algorithmic settings, we use SWBD for ordering experiments, and we also artificially entangle pairs of telephone dialogues to create synthetic transcripts that we can disentangle. Finally, we present results on actual internet chat corpora.
On synthetic SWBD transcripts, local coherence models improve performance considerably over our baseline model, Elsner and Charniak (2008b). On internet chat, we continue to do better on a constrained disentanglement task, though so far we are unable to apply these improvements to the full task. We suspect that, with better low-level annotation tools for the chat domain and a good way of integrating prior information, our improvements on SWBD could transfer fully to IRC chat.
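The synthetic-transcript setup described above (entangling a pair of dialogues so that the result can later be disentangled) can be illustrated with a minimal sketch. The text does not say how the two dialogues are interleaved, so the uniform random interleaving used here is an assumption, and the names `entangle` and `recover_threads` are hypothetical helpers, not part of the public implementation.

```python
import random
from typing import List, Sequence, Tuple

# Each utterance in the entangled transcript keeps its gold conversation id.
Utterance = Tuple[int, str]


def entangle(dialogue_a: Sequence[str],
             dialogue_b: Sequence[str],
             seed: int = 0) -> List[Utterance]:
    """Interleave two dialogues, preserving the order inside each one.

    Assumption: the interleaving is uniformly random; the source text
    does not specify the scheme actually used.
    """
    rng = random.Random(seed)
    # One label per utterance; shuffling the labels picks an interleaving.
    labels = [0] * len(dialogue_a) + [1] * len(dialogue_b)
    rng.shuffle(labels)
    sources = (iter(dialogue_a), iter(dialogue_b))
    return [(label, next(sources[label])) for label in labels]


def recover_threads(transcript: Sequence[Utterance],
                    n_threads: int = 2) -> List[List[str]]:
    """Split a labeled transcript back into its gold threads."""
    threads: List[List[str]] = [[] for _ in range(n_threads)]
    for label, text in transcript:
        threads[label].append(text)
    return threads


if __name__ == "__main__":
    a = ["Hi, how was the trip?", "Long, but fine.", "Glad to hear it."]
    b = ["Did the build pass?", "Not yet, one test fails."]
    for label, text in entangle(a, b, seed=1):
        print(label, text)
```

A disentanglement model would see only the utterance texts; the stored labels serve as the gold standard that its output is scored against.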
Similar resources
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...
The Relationship between Local and Global Coherence and Cognitive Processes in Persian-speaking Elderly Population
Objective: Many studies have suggested that there is a relationship between coherence and cognitive processes. This study aims at investigating this hypothesis through assessing the relationship between cognitive variables and coherence in the discourse of two groups of Persian-speaking younger and older adults. Methods: In order to evaluate our participants' cognitive capabilities, we recrui...
Disentangling Chat
When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately. We refer to this task as disentanglement. We present a corpus of Internet Relay Chat dialogue in which the various conversations have been manually disentangled, and evaluate annotator reliability. We propose a graph-based c...
Explicit references in chat-based CSCL: do they facilitate global text processing? evidence from eye movement analyses
Chat-based Computer Supported Collaborative Learning (CSCL) often suffers from limitations due to the communication medium. A frequently reported consequence is the lack of discourse coherence and by this a lack of cognitive coherence in the learning process. To overcome these deficiencies, the implementation of explicit references with chat messages caused higher learning results. We analysed ...
Optimal Universal Disentangling Machine for Two Qubit Quantum States
We derive an upper limit for the reduction factor for universal disentangling machine which uses only local operations. Impossibility of constructing a better disentangling machine, by using non-local operations, is discussed.